8 research outputs found

    Typo handling in searching of Quran verse based on phonetic similarities

    The Quran search system was built to make it easier for Indonesians to find a verse by typing its Indonesian pronunciation, a solution for users who have difficulty writing or typing Arabic characters. Searching by phonetic similarity makes it easier for Indonesian Muslims to find a particular verse. Lafzi was one of the systems that developed this kind of search, and it was later extended under the name Lafzi+. Lafzi+ can handle queries containing typos, but it covers only a limited range of typing-error types. This research presents Lafzi++, an improvement on the earlier systems that handles more typographical error types by applying typo correction with an autocomplete method to correct erroneous queries and the Damerau-Levenshtein distance to calculate the edit distance, so that the system can offer query suggestions when a user mistypes a search, whether by substitution, insertion, deletion, or transposition. Users can also search easily because they type Latin characters according to Indonesian pronunciation. The evaluation shows that Lafzi++ improves on the previous system: the accuracy for each tested query surpasses that of its predecessor, with a highest recall of 96.20% and a highest Mean Average Precision (MAP) of 90.69%.
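As an illustration of the edit-distance step mentioned in this abstract, the following is a minimal sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment), which counts substitutions, insertions, deletions, and adjacent transpositions. It is not the Lafzi++ implementation: the function names `osa_distance` and `suggest`, the `max_distance` threshold, and the sample transliterations are all assumptions made for the example.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "du" typed as "ud".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(query: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary entries within max_distance of the query, nearest first.

    Hypothetical helper: the threshold and the ranking rule are assumptions.
    """
    scored = sorted((osa_distance(query, word), word) for word in vocabulary)
    return [word for dist, word in scored if dist <= max_distance]


# Made-up Latin transliterations, used only as sample input.
print(suggest("alhamud", ["alhamdu", "arrahman", "maliki"]))
```

The restricted variant never edits the same substring twice, so it can differ from the unrestricted Damerau-Levenshtein distance on some inputs; the abstract does not say which variant the authors used.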

    Multiagent Systems for Robust IoT Services (頑健なIoTサービスのためのマルチエージェントシステム)

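    Kyoto University doctoral thesis (new-system course doctorate), Doctor of Informatics, degree no. Kō 20028 (Jōhaku no. 623), University Library call number 新制||情||108. Department of Social Informatics, Graduate School of Informatics, Kyoto University. Examiners: Professor Toru Ishida (chief examiner), Professor Hirokazu Tatano, Professor Akihiro Yamamoto. Conferred under Article 4, Paragraph 1 of the Degree Regulations.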

    Improving Document Retrieval with Spelling Correction for Weak and Fabricated Hadith in Indonesian Translation

    Hadith, the foundation of Islamic law after the Quran, has several levels of authenticity: authentic (shahih), acceptable (hasan), weak (dhaif), and fabricated (maudhu). Hadiths of the last two kinds may not originate from the Prophet Muhammad PBUH and thus should not be considered when deriving Islamic law (sharia). However, many such hadiths are commonly mistaken for authentic hadiths among ordinary Muslims. To make such hadiths easier to distinguish, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and performs spelling correction using symspell, on data from Wikipedia, social networks, and news portals, to test whether spell checking improves the accuracy of hadith retrieval; this had not been done in previous work, and typos are common in Indonesian-translated hadiths in raw text on the Web and social media. The experimental results show that spell checking improves the mean average precision to 81% (from 73%) and recall to 89% (from 80%). Spelling correction therefore makes the hadith retrieval system more feasible, and its use is encouraged in future work because it can correct the typos that are common in raw text on the Internet.
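The vector space model named in this abstract can be sketched as follows: documents and the query become TF-IDF vectors, and documents are ranked by cosine similarity to the query. This is a rough illustration only. The weighting formula (`count * (log(N / df) + 1)`), the whitespace tokenizer, the function names, and the toy documents are assumptions, and the symspell correction step is deliberately left out rather than guessed at.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed weighting: smoothed log inverse document frequency.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    # Query terms missing from the collection contribute nothing.
    q_vec = {term: c * idf[term] for term, c in counts.items() if term in idf}
    return sorted(((cosine(q_vec, vec), i) for i, vec in enumerate(vectors)), reverse=True)


# Toy collection; the strings are placeholders, not real data.
docs = ["patience brings reward", "charity brings blessing", "knowledge brings light"]
print(rank("knowledge light", docs)[0])
print(rank("knowlege", docs)[0])  # a misspelled term matches nothing
```

The second query shows why a spelling-correction step matters in this setting: a single typo removes the only matching term, and every document scores zero.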

    Topic Classification of Islamic Question and Answer Using Naïve Bayes and TF-IDF Method

    Information spread through the internet is widely used by people looking for answers to all kinds of questions, and Islamic religious knowledge is among the most searched topics. However, the large amount of information available from various sources makes it difficult for people to find the correct information. Previous researchers have studied this topic, but the datasets they used came from only one source. Therefore, in this study, a classification system for Islamic question-and-answer topics was built using the Naïve Bayes and TF-IDF methods. The study uses 1000 question-and-answer articles taken from two Islamic consultation websites, rumahfiqih.com and islamqa.info. The multi-class classification uses five categories, labeled manually according to the category classes on the websites. Across the test scenarios in this study, the Naïve Bayes classifier using TF-IDF (n-gram level) with a maximum of 1000 features and a 70:30 data split produces the highest accuracy, 81%. The SVM classifier also reached 81% accuracy, but it did so using TF-IDF at the word level. Future research is expected to use more website sources and to explore other classification and feature extraction methods that may perform better.

    Topic Classification of Quranic Verses in English Translation Using Word Centrality Measurement

    Every Muslim in the world believes that the Quran is a miracle and the word of God (Kalamullah), revealed to the Prophet Muhammad SAW to be conveyed to humankind. The Quran is used as a guide in dealing with problems in every aspect of life. To study the Quran, it is necessary to know which topic is discussed in each verse. With the help of technology, topics can be assigned to the verses of the Quran automatically. This task is called multilabel classification, in which each input can be assigned to one or more categories. This research applies multilabel classification to classify Quranic verses in English translation into 10 topics, using a word centrality measure as the term weighting value. Four classification methods are then compared: SVM, Naïve Bayes, KNN, and Decision Tree. The centrality measurement shows that the word ‘Allah’ is the most central word in the whole text of the Quran in the scenario that uses stopword removal. Furthermore, using word centrality values as term weights in feature extraction can improve the performance of the classification system.
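To make the idea of word centrality concrete, here is a small sketch that links words appearing next to each other in a sentence and scores each word by degree centrality. The abstract does not say how the graph was built or which centrality measure was used, so the adjacency-based graph, the choice of degree centrality, the function name, and the sample sentences are all assumptions.

```python
from collections import defaultdict


def degree_centrality(sentences, stopwords=frozenset()):
    """Degree centrality of words in an undirected adjacency graph.

    Two words are linked when they appear next to each other in a sentence
    after stopword removal (an assumed construction, for illustration only).
    """
    neighbors = defaultdict(set)
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in stopwords]
        for left, right in zip(tokens, tokens[1:]):
            if left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)
    n_nodes = len(neighbors)
    if n_nodes < 2:
        return {}
    # Normalise by the maximum possible number of neighbours.
    return {word: len(adj) / (n_nodes - 1) for word, adj in neighbors.items()}


# Placeholder sentences, not verse translations.
sentences = ["the river feeds the valley", "the valley shelters the village"]

with_stopwords = degree_centrality(sentences)
without_stopwords = degree_centrality(sentences, stopwords=frozenset({"the"}))

print(max(with_stopwords, key=with_stopwords.get))
print(max(without_stopwords, key=without_stopwords.get))
```

In this toy input the function word dominates the graph until it is filtered out, which is one reason a study might evaluate scenarios with and without stopword removal.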

    Applicant Text Analysis for Personality Classification Using Multinomial Naïve Bayes and Decision Tree

    The quality of its employees affects a company's performance, and with a large number of applicants it is difficult to find suitable ones. To help with this, companies carry out psychological tests to assess applicants' personalities, since personality is considered to be related to work performance. But psychological testing requires a lot of effort, cost, and human resources. A system that can classify personality from text can therefore help reduce the effort needed. Similar studies took the Big Five personality traits as their theoretical basis but used only one of the traits, applying the k-NN method with 65% accuracy. Building on these studies, accuracy can be improved by finding the best parameters using all of the Big Five traits. This research is based on the Big Five personality traits and the related traits of conscientiousness and agreeableness. The data used is text that has been labelled, pre-processed, and feature-selected. The cleaned text is used to create classification models using multinomial naïve Bayes and decision trees. Six models were built for 3 work cultures: decision trees with accuracies of 33%, 66%, and 80%, and multinomial naïve Bayes with accuracies of 83%, 50%, and 60%, the latter reported as performing better.

    Identifying Emotion on Indonesian Tweets using Convolutional Neural Networks

    [...] especially with the advancement of deep learning methods for text classification. Despite some efforts to identify emotion in Indonesian tweets, the performance evaluation results have not reached acceptable levels. To address this, this paper implements a classification model using a convolutional neural network (CNN), which has demonstrated the expected performance in text classification. To allow direct comparison with previous research, the classification is performed on the same dataset, which consists of 4,403 Indonesian tweets labeled with five emotion classes: anger, fear, joy, love, and sadness. The evaluation yields a precision, recall, and F1-score of 90.1%, 90.3%, and 90.2%, respectively, and a highest accuracy of 89.8%. These results outperform previous research that performed the same classification on the same dataset.

    Classifying Quranic Verse Topics using Word Centrality Measure

    Muslims believe that, as the speech of Allah, the Quran is a miracle with special features of its own. Some of the features that have been studied are regularities in the number of letters, words, vocabulary items, and so on. In the past, early Islamic scholars identified these regularities manually, i.e., by counting the occurrences of each word by hand. This research tackles the problem by using centrality in Quranic verse topic classification. Its goal is to analyze the effect of a Quran word centrality measure on the topic classification of Quranic verses. To this end, the method constructs a Quran word graph and then includes the centrality scores as one of the features in verse topic classification. The effect of centrality is observed with support vector machine (SVM) and naïve Bayes classifiers in two scenarios (with and without stopword removal). The results show that, according to the centrality measure, the word “الله” (Allah) is the most central in the Quran. The performance evaluation of the classification models shows that using centrality improves the Hamming loss from 0.43 to 0.21 for the naïve Bayes classifier with stopword removal. Finally, both classification methods perform better on the word graph that uses stopword removal.
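For readers unfamiliar with the metric quoted in this abstract, Hamming loss is the fraction of individual label assignments that are wrong across all samples and labels, so lower is better and 0.21 is an improvement over 0.43. The sketch below computes it for binary indicator rows; the function name and the toy label matrices are assumptions for illustration and are unrelated to the paper's data.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots where prediction and ground truth disagree.

    Each row is a list of 0/1 indicators, one per topic label.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_labels = len(y_true[0])
    wrong = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != n_labels or len(pred) != n_labels:
            raise ValueError("every row must have the same number of labels")
        wrong += sum(1 for t, p in zip(truth, pred) if t != p)
    return wrong / (len(y_true) * n_labels)


# Two samples, three hypothetical topic labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 6
```

Unlike exact-match accuracy, this metric gives partial credit when a verse receives some but not all of its correct topics.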